
3 research outputs found

Program Similarity Analysis for Malware Classification and its Pitfalls

Author: Marastoni Niccolo'
Publication date: 01/01/2021

Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital in order to understand the threat they pose and how to protect against them. Recognizing whether one program shares behaviors with another is a task that requires semantic reasoning, meaning that it needs to consider what a program actually does. This is a famously uncomputable problem, due to Rice’s theorem. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least easy to retrieve), most approaches employ semantic “abstractions”, which are computable approximations of the semantics of the program. We consider this the first scenario for this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a tool for malware analysis, R.E.H.A., which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first successfully evaluated against a state-of-the-art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions used to produce more fine-grained features, we can improve the accuracy of the results, as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is often not available or easily retrievable.
For this reason, we introduce a second scenario in which the classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot be assessed using semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This led us to develop novel deep learning models, a convolutional neural network and a long short-term memory network, to classify the samples into their respective families. To overcome the usual deep learning obstacle of lacking sufficiently large and balanced datasets, we utilize obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can be used to successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis to allow proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework that is able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and show how these obfuscations actually hinder the dynamic analysis process.
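As a rough illustration of the idea of treating a compiled program as raw data and turning it into an image, the sketch below lays out a byte string row by row as a grayscale pixel grid. The abstract does not say how the thesis performs this step, so the function name `bytes_to_image`, the fixed row width, and the zero padding of the last row are all assumptions made for the example.

```python
def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay out raw bytes row by row as a grayscale grid (one byte = one pixel, 0-255).

    Hypothetical helper: the fixed width and zero padding are illustrative
    choices, not taken from the thesis.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # zero-pad the final, shorter row
        rows.append(row)
    return rows


# Stand-in for the contents of a compiled binary (40 arbitrary bytes).
sample = bytes(range(40))
image = bytes_to_image(sample, width=16)
print(len(image), "rows of", len(image[0]), "pixels")
```

A grid of this kind could then be fed to an image classifier; the choice of width changes which byte patterns end up vertically adjacent, so in practice it would likely need tuning.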

Catalogo dei prodotti della ricerca

Revealing Similarities in Android Malware by Dissecting their Methods

Author: Dalla Preda Mila
Marastoni Niccolo'
Pasetto Michele
Publication date: 01/01/2020

One of the most challenging problems in the fight against Android malware is finding a way to classify samples according to their behavior, in order to be able to utilize previously gathered knowledge in analysis and prevention. In this paper we introduce a novel technique that discovers similarities between Android malware samples by comparing fragments of executed traces (strands) generated from their most suspect methods. This way we can accurately pinpoint which (possibly) malicious behaviors are shared between these different samples, allowing for easier analysis and classification. We implement this approach in a tool, StrAndroid, that we evaluate on a few datasets of malware and ransomware samples, comparing its results to an existing similarity tool.
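To make the notion of comparing strands between two samples concrete, here is a minimal sketch that models each sample as a set of strands and scores the overlap with the Jaccard index. The abstract does not describe the comparison StrAndroid actually uses, so the Jaccard index is a stand-in, and the strand contents and sample names below are invented for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical strands: each one is a tuple of abstract operation names.
sample_a = {("load", "encrypt", "write"), ("open", "read"), ("connect", "send")}
sample_b = {("load", "encrypt", "write"), ("connect", "send"), ("sleep",)}

shared = sample_a & sample_b          # strands present in both samples
score = jaccard(sample_a, sample_b)   # overall similarity between 0 and 1
print(sorted(shared), score)
```

Keeping the shared strands, and not only the score, matches the goal stated in the abstract of pinpointing which behaviors two samples have in common.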

Crossref

Catalogo dei prodotti della ricerca

Mining Totally Ordered Sequential Rules to Provide Timely Recommendations

Author: Dalla Vecchia Anna
Marastoni Niccolo
Migliorini Sara
Oliboni Barbara
Quintarelli Elisa
Publication date: 01/01/2023

In this paper we show the importance of mining totally ordered sequential rules, and in particular we propose an extension of sequential rules where not only the antecedent precedes the consequent, but their itemsets are labelled with an explicit representation of their relative order. This allows us to provide more precise timely recommendations. Our technique has been applied to a real-world scenario regarding the provision of tailored suggestions for supermarket shopping activities.
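The following sketch illustrates, in simplified form, what it means for a rule to respect order inside both the antecedent and the consequent. It is not the paper's formalism: a rule is modelled here as two ordered lists of itemsets, a sequence supports the rule if all of those itemsets occur in that order, and confidence is the usual ratio of rule matches to antecedent matches. The helper names and the shopping data are invented for the example.

```python
def embeds(pattern, sequence):
    """True if the itemsets of `pattern` occur in `sequence` in the same order."""
    pos = 0
    for itemset in pattern:
        # Advance to the earliest basket that contains this itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must come strictly later
    return True


def confidence(antecedent, consequent, sequences):
    """Share of sequences matching the antecedent that also continue with the consequent."""
    matched = [s for s in sequences if embeds(antecedent, s)]
    if not matched:
        return 0.0
    hits = sum(embeds(antecedent + consequent, s) for s in matched)
    return hits / len(matched)


# Hypothetical shopping histories: each one is an ordered list of baskets.
histories = [
    [{"bread"}, {"milk"}, {"eggs"}],
    [{"bread"}, {"milk"}, {"butter"}],
    [{"milk"}, {"bread"}, {"eggs"}],         # milk before bread: antecedent order violated
    [{"bread", "jam"}, {"milk"}, {"eggs"}],
]
rule_confidence = confidence([{"bread"}, {"milk"}], [{"eggs"}], histories)
print(round(rule_confidence, 3))
```

Because the third history buys milk before bread, it does not count towards the rule at all, which is the kind of distinction an unordered antecedent would miss.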

Catalogo dei prodotti della ricerca